Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU
Abstract
In this technical report, we study several parallel graph coloring algorithms and their application to the incomplete-LU factorization. We implement graph coloring based on different heuristics and evaluate their performance on the GPU. We also present a comprehensive comparison of the level-scheduling and graph coloring approaches for the incomplete-LU factorization and the triangular solve, and we discuss their trade-offs and differences from the mathematics and computer science perspectives. Finally, we present numerical experiments that compare the performance of both algorithms. In particular, we show that the incomplete-LU factorization based on graph coloring can achieve a speedup of almost 8× on the GPU over the reference MKL implementation on the CPU.
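The abstract does not say which coloring heuristics the report implements. As a minimal sketch of one well-known parallel heuristic, the Jones-Plassmann scheme with random vertex priorities, the Python function below simulates its rounds sequentially. The function name `jones_plassmann_coloring` and the adjacency-list input are illustrative assumptions, not the report's code. In each round, the uncolored vertices whose priority beats all of their uncolored neighbors form an independent set, so on a GPU they could be colored simultaneously, one thread per vertex.

```python
import random

def jones_plassmann_coloring(adj, seed=0):
    """Color an undirected graph given as adjacency lists (hypothetical helper).

    Each round, every uncolored vertex whose random priority beats all of its
    uncolored neighbors takes the smallest color not used by a colored
    neighbor. Those winners are pairwise non-adjacent, so a GPU could color
    them in parallel; here the rounds are simulated sequentially.
    """
    n = len(adj)
    rng = random.Random(seed)
    priority = [rng.random() for _ in range(n)]
    color = [-1] * n
    uncolored = set(range(n))
    while uncolored:
        # Local maxima among uncolored vertices; ties broken by vertex index.
        winners = [
            v for v in uncolored
            if all((priority[v], v) > (priority[u], u)
                   for u in adj[v] if color[u] == -1)
        ]
        for v in winners:
            used = {color[u] for u in adj[v] if color[u] != -1}
            c = 0
            while c in used:  # smallest color not taken by a neighbor
                c += 1
            color[v] = c
        uncolored.difference_update(winners)
    return color
```

For example, on a triangle 0-1-2 with an extra vertex 3 attached to vertex 2 (`adj = [[1, 2], [0, 2], [0, 1, 3], [2]]`), the three triangle vertices always receive three distinct colors. Because each vertex takes the smallest free color, no color index exceeds the vertex's degree, and the vertex with the globally highest remaining priority always wins, so every round makes progress.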
Related Papers
Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU
A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization...
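The level-building step of the symbolic analysis phase described above can be sketched for the lower triangular factor as follows. This assumes that the sparsity pattern is stored in CSR arrays `row_ptr` and `col_idx` and that the function is called `level_schedule`; these are hypothetical names, not taken from the paper. Each row's level is one more than the deepest row it depends on, so rows that share a level are independent and can be processed in parallel.

```python
from collections import defaultdict

def level_schedule(row_ptr, col_idx):
    """Group rows of a sparse lower-triangular pattern (CSR) into levels.

    Row i depends on every row j < i with a nonzero in column j of row i.
    Rows in the same level have no dependencies on each other.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        # Rows are visited in increasing order, so every dependency j < i
        # already has its final level when row i is processed.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]
```

For a 4×4 pattern with rows {0}, {1}, {0, 2} and {1, 2, 3} (`row_ptr = [0, 1, 2, 4, 7]`, `col_idx = [0, 1, 0, 2, 1, 2, 3]`), this yields the levels `[[0, 1], [2], [3]]`. Only the sparsity pattern is used, so the schedule can be computed once and reused for every numerical factorization that shares that pattern.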
Parallel Triangular Solvers on GPU
In this paper, we systematically investigate GPU-based parallel triangular solvers. Parallel triangular solvers are fundamental to the incomplete-LU family of preconditioners and to algebraic multigrid solvers. We develop a new matrix format suited to GPU devices, and we develop parallel lower and upper triangular solvers for this new data structure. With these solv...
Parallel Iterative Solution of Sparse Linear Systems Using Orderings from Graph Coloring Heuristics
The efficiency of a parallel implementation of the conjugate gradient method preconditioned by an incomplete Cholesky factorization can vary dramatically depending on the column ordering chosen. One method to minimize the number of major parallel steps is to choose an ordering based on a coloring of the symmetric graph representing the nonzero adjacency structure of the matrix. In this paper, we...
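To illustrate how a coloring induces such an ordering, the sketch below lists the vertices color by color. It assumes that a valid coloring is already available as a list of color indices, and the function name `color_ordering` is hypothetical. Because vertices of the same color are never adjacent, each color's diagonal block in the symmetrically permuted matrix is itself diagonal. All rows of one color can therefore be processed in a single parallel step of a triangular solve, and the number of such steps equals the number of colors.

```python
def color_ordering(colors):
    """Return (perm, offsets) grouping vertices by color (hypothetical helper).

    perm[new_index] = old_index. Vertices of color c occupy
    perm[offsets[c]:offsets[c + 1]]; since a valid coloring never gives two
    adjacent vertices the same color, each such block has no internal edges.
    """
    num_colors = max(colors) + 1
    perm = []
    offsets = [0]
    for c in range(num_colors):
        perm.extend(v for v, cv in enumerate(colors) if cv == c)
        offsets.append(len(perm))
    return perm, offsets
```

For the path 0-1-2-3 with the two-coloring `[0, 1, 0, 1]`, the result is `perm == [0, 2, 1, 3]` and `offsets == [0, 2, 4]`: two parallel steps, one per color, as in a red-black ordering.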
Level-based Incomplete LU Factorization: Graph Model and Algorithms
A graph theoretic process that models level-based, incomplete LU factorization (ILU(ℓ)) of sparse unsymmetric matrices is developed. The model leads to two incomplete fill path theorems that are generalizations of the original fill path theorem of Rose, Tarjan, and Lueker. Our S-level incomplete fill path theorem leads to the development of new, embarrassingly parallel algorithms for computing ...
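For reference, the level-of-fill rule behind ILU(ℓ) can be sketched with the classic row-by-row symbolic algorithm, as commonly presented in textbooks such as Saad's Iterative Methods for Sparse Linear Systems. This is not the fill-path-based parallel algorithm that the abstract describes. The function name `ilu_levels` and the set-of-columns input format are assumptions. Original nonzeros get level 0, a fill entry at (i, j) created through pivot row k gets level lev(i, k) + lev(k, j) + 1, and entries whose level would exceed ℓ (the parameter `ell`) are dropped.

```python
import math

def ilu_levels(pattern, ell):
    """Symbolic ILU(ell): return one {column: level} dict per row (hypothetical helper).

    pattern[i] is the set of column indices of the nonzeros in row i.
    Original nonzeros have level 0. Pivot row k creates or updates entry
    (i, j), j > k, with level lev(i, k) + lev(k, j) + 1; entries whose level
    would exceed ell are never stored.
    """
    n = len(pattern)
    levels = []  # levels[k] holds the finished pattern of row k
    for i in range(n):
        lev = {j: 0 for j in pattern[i]}
        k = 0
        while True:
            # Next pivot: smallest stored column in [k, i). Fill created
            # earlier in this row can itself become a pivot later.
            pivots = [c for c in lev if k <= c < i]
            if not pivots:
                break
            k = min(pivots)
            for j, lkj in levels[k].items():
                if j > k:  # only the U part of pivot row k contributes
                    new = lev[k] + lkj + 1
                    if new <= ell and new < lev.get(j, math.inf):
                        lev[j] = new
            k += 1
        levels.append(lev)
    return levels
```

With rows {0, 3}, {0, 1}, {1, 2} and {3}, eliminating row 1 creates fill at (1, 3) with level 1, and eliminating row 2 through that fill creates (2, 3) with level 2. ILU(0) keeps neither entry, ILU(1) keeps only (1, 3), and ILU(2) keeps both.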
Acceleration of Turbomachinery Steady Simulations on GPU
Steady-state simulations in Computational Fluid Dynamics (CFD), which rely on implicit time integration, are not seeing large speedups on GPUs. Moreover, most reported acceleration efforts concern solving the linear system of equations while neglecting the acceleration potential of running the entire simulation on the GPU. In this paper, we present the software implementation ...